Answer Extraction for Definition Questions using Information Gain and Machine Learning
Abstract
Extracting nuggets (pieces of an answer) is a very important process in question answering systems, especially for definition questions. Although nugget extraction has advanced, the remaining problem is to find general, flexible patterns that yield as many useful definition nuggets as possible. Currently, patterns are obtained manually or automatically and are then matched against sentences. In contrast to this traditional way of working with patterns, we propose a method that uses information gain and machine learning instead of pattern matching. We classify sentences as likely or unlikely to contain nuggets. We also analyze separately the nuggets that appear to the left and to the right of the target term (the term to be defined) within a sentence. We performed experiments with the question collections from TREC 2002, 2003, and 2004, and the F-measures obtained are comparable to those of the participating systems.
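As a rough illustration of the idea in the abstract, the sketch below computes the information gain of a single term with respect to a binary label (sentence contains a nugget or not) and splits a tokenized sentence into the contexts to the left and right of the target term. The abstract does not specify the features, tokenization, or classifier, so the function names (`entropy`, `information_gain`, `split_contexts`) and the toy data are assumptions made only for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(sentences, labels, term):
    """Information gain of the binary feature 'term occurs in the sentence'.

    sentences: list of token collections (hypothetical representation).
    labels:    1 if the sentence contains a nugget, 0 otherwise.
    """
    with_term = [y for s, y in zip(sentences, labels) if term in s]
    without_term = [y for s, y in zip(sentences, labels) if term not in s]
    n = len(labels)
    remainder = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - remainder


def split_contexts(tokens, target):
    """Return the tokens to the left and to the right of the first target occurrence."""
    i = tokens.index(target)
    return tokens[:i], tokens[i + 1:]


# Toy data (invented for illustration, not taken from TREC).
sentences = [{"is", "a"}, {"is", "known"}, {"went", "to"}, {"said", "that"}]
labels = [1, 1, 0, 0]

gain_is = information_gain(sentences, labels, "is")    # term separates the classes
gain_the = information_gain(sentences, labels, "the")  # term never occurs
left, right = split_contexts(["TB", "is", "a", "disease"], "TB")
```

Terms with high information gain could then serve as features for a sentence classifier, computed separately over the left and right contexts; which classifier and which feature set work best is left open here.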
Related resources
A Machine Learning Approach to No-Reference Objective Video Quality Assessment for High Definition Resources
The video quality assessment must be adapted to the human visual system, which is why researchers have performed subjective viewing experiments in order to obtain the conditions of encoding of video systems to provide the best quality to the user. The objective of this study is to assess the video quality using image features extraction without using reference video. RMSE values and processing ...
NTT's Question Answering System for NTCIR-6 QAC-4
NTCIR-6 QAC-4 organizers announced that there would be no restriction (such as factoid) on QAC4 questions, but they plan to include many ‘definition’ questions and ‘why’ questions. Therefore, we focused on these two question types. For ‘definition’ questions, we used a simple pattern-based approach. For ‘why’ questions, hand-crafted rules were used in previous work for answer candidate extracti...
Similarity measurement for describe user images in social media
Online social networks like Instagram are places for communication. Also, these media produce rich metadata which are useful for further analysis in many fields including health and cognitive science. Many researchers are using these metadata like hashtags, images, etc. to detect patterns of user activities. However, there are several serious ambiguities like how much reliable are these informa...
Conceptualization to Develop Machine Learning Techniques for Information Extraction: Consistency Queries
The information extraction from documents is an increasingly urgent problem of enterprise knowledge management. Knowledge sources may be internal like text files and forms of business administration processes or external like HTML pages, e.g. When the number of knowledge sources is paramount, substantial computer support is inevitable. Machine learning techniques play a crucial role. A prototyp...
Using dependency parsing and machine learning for factoid question answering on spoken documents
This paper presents our experiments in question answering for speech corpora. These experiments focus on improving the answer extraction step of the QA process. We present two approaches to answer extraction in question answering for speech corpora that apply machine learning to improve the coverage and precision of the extraction. The first one is a reranker that uses only lexical information,...